An analysis of the stop and frisk policy in NYC from 2017 - 2019 using R.
# Packages
library(tidyverse) # Easily Install and Load the 'Tidyverse', CRAN v1.3.0
library(patchwork) # The Composer of Plots, CRAN v1.1.1
library(here) # A Simpler Way to Find Your Files, CRAN v1.0.1
library(janitor) # Simple Tools for Examining and Cleaning Dirty Data, CRAN v2.1.0
library(tidytext) # Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools, CRAN v0.3.0
library(textdata) # Download and Load Various Text Datasets, CRAN v0.4.1
library(vader) # Valence Aware Dictionary and sEntiment Reasoner (VADER), CRAN v0.2.1
library(readxl) # Read Excel Files, CRAN v1.3.1
library(lubridate) # Make Dealing with Dates a Little Easier, CRAN v1.7.10
library(gt) # Easily Create Presentation-Ready Display Tables, CRAN v0.2.2
library(kableExtra) # Construct Complex Table with 'kable' and Pipe Syntax, CRAN v1.3.4
library(knitr) # A General-Purpose Package for Dynamic Report Generation in R, CRAN v1.31
library(plotly) # Create Interactive Web Graphics via 'plotly.js', CRAN v4.9.3
This article was originally completed in python and can be found here: link
In mid 2020, protests took place all over the USA and the world in response to the murder of George Floyd by the Minneapolis Police. Since the advent of video recordings via cell phones, there have been many instances of police brutality captured against Black people. The relationship between law enforcement and the Black community has always been tenuous and now it has been brought to the attention of the rest of the world. In the landmark Supreme court case of Terry v. Ohio (1968) it was ruled that police could stop, question, and frisk a person if they have reasonable suspicion that the person had committed a crime (Brandes, S.A. et al., 2019).
In the 2000’s to the early 2010’s the New York City (NYC) stop and frisk policy garnered national attention due to the high number of stops and profiling of Black people. At the height of the policy, in 2011 there were 658,724 stops recorded with over 50% of the stops targeting Black people (NYCLU 2019). Since then, the number of stops per year has substantially decreased to 13,459 stop in 2019. Opponents of this policy argue this is still too many stops, especially since in 2019 about 66% of the people stopped were innocent. It has been also shown that the stopping of white people more likely led to an arrest in comparison to Black and Hispanic people, implying the police may be targeting minorities and being more mindful of stopping white people (Gelman, A., et. al., 2007). There has been substantial research conducted showing the psychological distress of a stop and frisk policy on communities of color in NYC (Sewell, A. et al., 2016).
In this project I analyze the Stop, Question and Frisk Data from the New York Police Department (NYPD) from the most current three years: 2017, 2018, and 2019 (NYC Stop and Frisk Data). I chose these years for the following reasons: The years 2018-2019 was not included in the most recent NYCLU report. In 2017 the NYPD moved to an electronic form, as opposed to manually writing down a response for each question in the handwritten forms used prior to 2017. Lastly 2017 was the first year of the Trump presidency and I was curious to investigate if his rhetoric on race may have affected law enforcement’s behavior toward minorities.
Data was downloaded from the NYPD website link. In each dataset each row is a stop of a specific person, and each column is a variable. There are a total of 83 different variables in each dataset.Below is a table of the first 10 stops from this dataset.
# Read in data
saf_17 <- read_excel(here("_texts", "stop_frisk_R", "data",
"sqf_2017.xlsx"))
saf_18 <- read_excel(here("_texts", "stop_frisk_R", "data",
"sqf_2018.xlsx"))
saf_19 <- read_excel(here("_texts", "stop_frisk_R", "data",
"sqf_2019.xlsx"))
# fix date column and select columns to be used for analysis
saf_17 <- saf_17 %>%
mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
select(
c(
"STOP_FRISK_DATE",
"STOP_FRISK_TIME",
"YEAR2",
"MONTH2",
"DAY2",
"ISSUING_OFFICER_RANK",
"SUSPECTED_CRIME_DESCRIPTION",
"SUSPECT_ARRESTED_FLAG",
"SUSPECT_ARREST_OFFENSE",
"OFFICER_IN_UNIFORM_FLAG",
"FRISKED_FLAG",
"SEARCHED_FLAG",
"OTHER_CONTRABAND_FLAG",
"FIREARM_FLAG",
"KNIFE_CUTTER_FLAG",
"OTHER_WEAPON_FLAG",
"WEAPON_FOUND_FLAG",
"PHYSICAL_FORCE_CEW_FLAG",
"PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
"PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
"PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
"PHYSICAL_FORCE_OTHER_FLAG",
"PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
"PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
"PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
"SUSPECTS_ACTIONS_CASING_FLAG",
"SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
"DEMEANOR_OF_PERSON_STOPPED",
"SUSPECT_REPORTED_AGE",
"SUSPECT_SEX",
"SUSPECT_RACE_DESCRIPTION",
"SUSPECT_BODY_BUILD_TYPE",
"SUSPECT_OTHER_DESCRIPTION",
"STOP_LOCATION_PRECINCT",
"STOP_LOCATION_FULL_ADDRESS",
"STOP_LOCATION_STREET_NAME",
"STOP_LOCATION_PATROL_BORO_NAME",
"STOP_LOCATION_BORO_NAME",
"SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
)
)
saf_18 <- saf_18 %>%
mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
select(
c(
"STOP_FRISK_DATE",
"STOP_FRISK_TIME",
"YEAR2",
"MONTH2",
"DAY2",
"ISSUING_OFFICER_RANK",
"SUSPECTED_CRIME_DESCRIPTION",
"SUSPECT_ARRESTED_FLAG",
"SUSPECT_ARREST_OFFENSE",
"OFFICER_IN_UNIFORM_FLAG",
"FRISKED_FLAG",
"SEARCHED_FLAG",
"OTHER_CONTRABAND_FLAG",
"FIREARM_FLAG",
"KNIFE_CUTTER_FLAG",
"OTHER_WEAPON_FLAG",
"WEAPON_FOUND_FLAG",
"PHYSICAL_FORCE_CEW_FLAG",
"PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
"PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
"PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
"PHYSICAL_FORCE_OTHER_FLAG",
"PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
"PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
"PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
"SUSPECTS_ACTIONS_CASING_FLAG",
"SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
"DEMEANOR_OF_PERSON_STOPPED",
"SUSPECT_REPORTED_AGE",
"SUSPECT_SEX",
"SUSPECT_RACE_DESCRIPTION",
"SUSPECT_BODY_BUILD_TYPE",
"SUSPECT_OTHER_DESCRIPTION",
"STOP_LOCATION_PRECINCT",
"STOP_LOCATION_FULL_ADDRESS",
"STOP_LOCATION_STREET_NAME",
"STOP_LOCATION_PATROL_BORO_NAME",
"STOP_LOCATION_BORO_NAME",
"SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
)
)
saf_19 <- saf_19 %>%
mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
select( # select for the years
c(
"STOP_FRISK_DATE",
"STOP_FRISK_TIME",
"YEAR2",
"MONTH2",
"DAY2",
"ISSUING_OFFICER_RANK",
"SUSPECTED_CRIME_DESCRIPTION",
"SUSPECT_ARRESTED_FLAG",
"SUSPECT_ARREST_OFFENSE",
"OFFICER_IN_UNIFORM_FLAG",
"FRISKED_FLAG",
"SEARCHED_FLAG",
"OTHER_CONTRABAND_FLAG",
"FIREARM_FLAG",
"KNIFE_CUTTER_FLAG",
"OTHER_WEAPON_FLAG",
"WEAPON_FOUND_FLAG",
"PHYSICAL_FORCE_CEW_FLAG",
"PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
"PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
"PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
"PHYSICAL_FORCE_OTHER_FLAG",
"PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
"PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
"PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
"SUSPECTS_ACTIONS_CASING_FLAG",
"SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
"DEMEANOR_OF_PERSON_STOPPED",
"SUSPECT_REPORTED_AGE",
"SUSPECT_SEX",
"SUSPECT_RACE_DESCRIPTION",
"SUSPECT_BODY_BUILD_TYPE",
"SUSPECT_OTHER_DESCRIPTION",
"STOP_LOCATION_PRECINCT",
"STOP_LOCATION_FULL_ADDRESS",
"STOP_LOCATION_STREET_NAME",
"STOP_LOCATION_PATROL_BORO_NAME",
"STOP_LOCATION_BORO_NAME",
"SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
)
)
# row bind all 3 years into one data frame
saf <- rbind(saf_17, saf_18, saf_19)
# clean column names
saf <- saf %>%
clean_names()
# view first 5 rows
head(saf, n = 10) %>%
kbl(caption = "<b style = 'color:black;'>
First ten rows of stop and frisk dataset.") %>%
kable_material_dark(bootstrap_options = c("striped", "hover")) %>%
row_spec(0, color = "white", background = "#222222") %>%
scroll_box(width = "100%", height = "300px",
fixed_thead = list(enabled = T, background = "#222222"))
| stop_frisk_date | stop_frisk_time | year2 | month2 | day2 | issuing_officer_rank | suspected_crime_description | suspect_arrested_flag | suspect_arrest_offense | officer_in_uniform_flag | frisked_flag | searched_flag | other_contraband_flag | firearm_flag | knife_cutter_flag | other_weapon_flag | weapon_found_flag | physical_force_cew_flag | physical_force_draw_point_firearm_flag | physical_force_handcuff_suspect_flag | physical_force_oc_spray_used_flag | physical_force_other_flag | physical_force_restraint_used_flag | physical_force_verbal_instruction_flag | physical_force_weapon_impact_flag | suspects_actions_casing_flag | suspects_actions_proximity_to_scene_flag | demeanor_of_person_stopped | suspect_reported_age | suspect_sex | suspect_race_description | suspect_body_build_type | suspect_other_description | stop_location_precinct | stop_location_full_address | stop_location_street_name | stop_location_patrol_boro_name | stop_location_boro_name | suspects_actions_drug_transactions_flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017-01-16 | 1899-12-31 14:26:00 | 2017 | January | Monday | SGT | TERRORISM | N | (null) | Y | N | N | N | (null) | (null) | (null) | N | (null) | (null) | (null) | (null) | (null) | (null) | Y | (null) | (null) | (null) | TERRORISM | 39 | MALE | (null) | THN | (null) | 1 | 180 GREENWICH STREET | GREENWICH STREET | PBMS | MANHATTAN | (null) |
| 2017-01-16 | 1899-12-31 14:26:00 | 2017 | January | Monday | SGT | TERRORISM | N | (null) | Y | N | N | N | (null) | (null) | (null) | N | (null) | (null) | (null) | (null) | (null) | (null) | Y | (null) | (null) | (null) | TERRORISM | 37 | MALE | (null) | MED | (null) | 1 | 180 GREENWICH STREET | GREENWICH STREET | PBMS | MANHATTAN | (null) |
| 2017-02-08 | 1899-12-31 11:10:00 | 2017 | February | Wednesday | POM | OTHER | N | (null) | N | N | N | N | (null) | (null) | (null) | N | (null) | (null) | (null) | (null) | (null) | (null) | Y | (null) | (null) | (null) | OTHER | (null) | FEMALE | WHITE | THN | N/A | 1 | WALL STREET && BROADWAY | WALL STREET | PBMS | MANHATTAN | (null) |
| 2017-02-20 | 1899-12-31 11:35:00 | 2017 | February | Monday | POM | GRAND LARCENY AUTO | N | (null) | Y | Y | Y | N | (null) | (null) | (null) | N | (null) | (null) | Y | (null) | (null) | (null) | (null) | (null) | (null) | (null) | GRAND LARCENY AUTO | 31 | MALE | BLACK HISPANIC | U | UNK | 1 | 75 GREENE STREET | GREENE STREET | PBMS | MANHATTAN | (null) |
| 2017-02-21 | 1899-12-31 13:20:00 | 2017 | February | Tuesday | POM | BURGLARY | N | (null) | Y | N | N | N | (null) | (null) | (null) | N | (null) | (null) | Y | (null) | (null) | (null) | (null) | (null) | (null) | Y | BURGLARY | (null) | FEMALE | BLACK | THN | (null) | 1 | 429 WEST BROADWAY | WEST BROADWAY | PBMS | MANHATTAN | (null) |
| 2017-02-17 | 1899-12-31 21:25:00 | 2017 | February | Friday | POM | CPW | Y | MENACING | Y | Y | Y | N | (null) | (null) | (null) | N | (null) | (null) | (null) | (null) | (null) | (null) | Y | (null) | (null) | (null) | CPW | 39 | MALE | WHITE HISPANIC | MED | (null) | 1 | WEST STREET && CHAMBERS STREET | WEST STREET | PBMS | MANHATTAN | (null) |
| 2017-02-25 | 1899-12-31 20:00:00 | 2017 | February | Saturday | POM | CPW | N | (null) | Y | N | N | N | (null) | (null) | (null) | N | (null) | (null) | Y | (null) | (null) | Y | Y | (null) | (null) | (null) | CPW | 19 | FEMALE | BLACK | THN | (null) | 1 | CHAMBERS STREET && WEST BROADWAY | CHAMBERS STREET | PBMS | MANHATTAN | (null) |
| 2017-02-25 | 1899-12-31 19:58:00 | 2017 | February | Saturday | POM | CPW | N | (null) | Y | Y | Y | N | (null) | (null) | (null) | N | (null) | (null) | (null) | (null) | (null) | Y | Y | (null) | (null) | Y | NORMAL | 15 | FEMALE | BLACK | THN | (null) | 1 | CHAMBERS STREET && WEST BROADWAY | CHAMBERS STREET | PBMS | MANHATTAN | (null) |
| 2017-02-21 | 1899-12-31 13:15:00 | 2017 | February | Tuesday | POM | BURGLARY | N | (null) | Y | Y | Y | N | (null) | (null) | (null) | N | (null) | (null) | Y | (null) | (null) | (null) | Y | (null) | (null) | Y | PLEASANT | 43 | MALE | BLACK | HEA | (null) | 1 | 429 WEST BROADWAY | WEST BROADWAY | PBMS | MANHATTAN | (null) |
| 2017-03-03 | 1899-12-31 08:16:00 | 2017 | March | Friday | POM | CRIMINAL MISCHIEF | Y | CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE | Y | Y | Y | Y | (null) | (null) | (null) | N | (null) | (null) | (null) | (null) | (null) | (null) | Y | (null) | (null) | (null) | NERVOUS | 33 | MALE | BLACK | THN | (null) | 1 | CORTLANDT STREET && CHURCH STREET | CORTLANDT STREET | PBMS | MANHATTAN | (null) |
nrow(saf)
[1] 36096
saf %>%
count(suspect_race_description) %>%
summarise(race = suspect_race_description,
n = n,
percent = n / sum(n)) %>%
gt() %>%
tab_header(
title = "Which Race was Stop and Frisked the Most?",
subtitle = "Data from 2017-2019"
) %>%
fmt_percent(
columns = vars(percent),
decimals = 2
) %>%
data_color(
columns = vars(n, percent),
colors = scales::col_numeric(
palette = c(
"lightskyblue", "dodgerblue", "royalblue4") ,
domain = NULL
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'))
),
locations = cells_body(
columns = 1:3
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'),
weight = "bold")
),
locations = cells_column_labels(
columns = 1:3
)
)
| Which Race was Stop and Frisked the Most? | ||
|---|---|---|
| Data from 2017-2019 | ||
| race | n | percent |
| (null) | 420 | 1.16% |
| AMER IND | 9 | 0.02% |
| AMERICAN INDIAN/ALASKAN N | 8 | 0.02% |
| AMERICAN INDIAN/ALASKAN NATIVE | 16 | 0.04% |
| ASIAN / PACIFIC ISLANDER | 522 | 1.45% |
| ASIAN/PAC.ISL | 206 | 0.57% |
| BLACK | 20817 | 57.67% |
| BLACK HISPANIC | 3102 | 8.59% |
| MALE | 7 | 0.02% |
| WHITE | 3266 | 9.05% |
| WHITE HISPANIC | 7723 | 21.40% |
In the above table we see that the people stopped were mostly described as have one of the following four races: Black, White Hispanic, White, or Black Hispanic.
saf_sub <- saf %>%
# filter for race in one of the four most saf
filter(suspect_race_description %in% c("BLACK",
"WHITE",
"WHITE HISPANIC",
"BLACK HISPANIC")) %>%
# Fill in NA values for nonsense descriptions
# also change misspellings
mutate(suspect_other_description = case_when(
suspect_other_description == "(null)" ~ NA_character_,
suspect_other_description == "NONE" ~ NA_character_,
suspect_other_description == "UNK" ~ NA_character_,
suspect_other_description == "UNKNOWN" ~ NA_character_,
suspect_other_description == "UNKNOW" ~ NA_character_,
suspect_other_description == "NO" ~ NA_character_,
suspect_other_description == "UKNOWN" ~ NA_character_,
suspect_other_description == "UNKOWN" ~ NA_character_,
suspect_other_description == "NA" ~ NA_character_,
suspect_other_description == "N/A" ~ NA_character_,
suspect_other_description == "0" ~ NA_character_,
suspect_other_description == 0 ~ NA_character_,
suspect_other_description == "TATOOS" ~ "TATTOOS",
suspect_other_description == "TATTOS" ~ "TATTOOS",
suspect_other_description == "TATOO" ~ "TATTOOS",
TRUE ~ as.character(suspect_other_description)
)) %>%
# Change unknown sex values to NA
mutate(suspect_sex = case_when(
suspect_sex == "(null)" ~ NA_character_,
suspect_sex == "19" ~ NA_character_,
suspect_sex == "23" ~ NA_character_,
suspect_sex == "24" ~ NA_character_,
suspect_sex == "39" ~ NA_character_,
TRUE ~ as.character(suspect_sex)
)) %>%
# Change (null) value to No (N) (need to find a better way to do this step)
mutate(physical_force_cew_flag = case_when(
physical_force_cew_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_cew_flag)
)) %>%
mutate(physical_force_draw_point_firearm_flag = case_when(
physical_force_draw_point_firearm_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_draw_point_firearm_flag)
)) %>%
mutate(physical_force_handcuff_suspect_flag = case_when(
physical_force_handcuff_suspect_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_handcuff_suspect_flag)
)) %>%
mutate(physical_force_oc_spray_used_flag = case_when(
physical_force_oc_spray_used_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_oc_spray_used_flag)
)) %>%
mutate(physical_force_other_flag = case_when(
physical_force_other_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_other_flag)
)) %>%
mutate(physical_force_restraint_used_flag = case_when(
physical_force_restraint_used_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_restraint_used_flag)
)) %>%
mutate(physical_force_verbal_instruction_flag = case_when(
physical_force_verbal_instruction_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_verbal_instruction_flag)
)) %>%
mutate(physical_force_weapon_impact_flag = case_when(
physical_force_weapon_impact_flag == "(null)" ~ "N",
TRUE ~ as.character(physical_force_weapon_impact_flag)
)) %>%
mutate(suspects_actions_drug_transactions_flag = case_when(
suspects_actions_drug_transactions_flag == "(null)" ~ "N",
TRUE ~ as.character(suspects_actions_drug_transactions_flag)
)) %>%
mutate(suspects_actions_proximity_to_scene_flag = case_when(
suspects_actions_proximity_to_scene_flag == "(null)" ~ "N",
TRUE ~ as.character(suspects_actions_proximity_to_scene_flag)
)) %>%
mutate(other_contraband_flag = case_when(
other_contraband_flag == "(null)" ~ "N",
TRUE ~ as.character(other_contraband_flag)
)) %>%
mutate(firearm_flag = case_when(
firearm_flag == "(null)" ~ "N",
TRUE ~ as.character(firearm_flag)
)) %>%
mutate(knife_cutter_flag = case_when(
knife_cutter_flag == "(null)" ~ "N",
TRUE ~ as.character(knife_cutter_flag)
)) %>%
mutate(weapon_found_flag = case_when(
weapon_found_flag == "(null)" ~ "N",
weapon_found_flag == "(" ~ "N",
TRUE ~ as.character(weapon_found_flag)
)) %>%
mutate(suspect_arrest_offense = case_when(
suspect_arrest_offense == "(null)" ~ "NO ARREST",
TRUE ~ as.character(suspect_arrest_offense)
)) %>%
mutate(stop_location_boro_name = case_when(
stop_location_boro_name == "(null)" ~ NA_character_,
stop_location_boro_name == "STATEN IS" ~ "STATEN ISLAND",
stop_location_boro_name == "PBBX" ~ "BRONX",
stop_location_boro_name == "PBBN" ~ "BROOKLYN",
stop_location_boro_name == "PBMN" ~ "MANHATTAN",
stop_location_boro_name == "0208760" ~ NA_character_,
stop_location_boro_name == "0190241" ~ NA_character_,
stop_location_boro_name == "0986759" ~ NA_character_,
stop_location_boro_name == "PBMS" ~ "MANHATTAN",
stop_location_boro_name == "0210334" ~ NA_character_,
stop_location_boro_name == "PBSI" ~ "STATEN ISLAND",
stop_location_boro_name == "0237177" ~ NA_character_,
stop_location_boro_name == "PBBS" ~ "BROOKLYN",
stop_location_boro_name == "0155070" ~ NA_character_,
stop_location_boro_name == "0208169" ~ NA_character_,
TRUE ~ as.character(stop_location_boro_name)
)) %>%
mutate(stop_location_precinct = case_when(
stop_location_precinct == "(null)" ~ NA_integer_,
stop_location_precinct == 208760 ~ NA_integer_,
TRUE ~ as.integer(stop_location_precinct)
)) %>%
mutate(demeanor_of_person_stopped = case_when(
demeanor_of_person_stopped == "IRRATE" ~ "IRATE",
demeanor_of_person_stopped == "1" ~ NA_character_,
demeanor_of_person_stopped == "NEVEVOUS" ~ "NERVOUS",
demeanor_of_person_stopped == 1 ~ NA_character_,
demeanor_of_person_stopped == "N/A" ~ NA_character_,
TRUE ~ as.character(demeanor_of_person_stopped)
)) %>%
mutate(suspect_reported_age = case_when(
suspect_reported_age == "(null)" ~ NA_character_,
suspect_reported_age == "0" ~ NA_character_,
suspect_reported_age == "1" ~ NA_character_,
TRUE ~ as.character(suspect_reported_age)
)) %>%
mutate(suspect_reported_age = as.integer(suspect_reported_age))
Due to the nature of the dataset, there had to be some data cleaning conducted in order to organize some of the columns:
For most of the columns that are flags, converted (null) or other values to N (No) values.
Changed non male or female values to NA in the suspect_sex column.
In the suspect_other_description, classified null of other values that were obviously null values to NA.
In the demeanor_of_person_stopped corrected some misspellings and classified some observations to NA if no demeanor was noted.
Organized the stop_location_boro_name so that it represents one of the 5 boroughs of NYC or NA if it was not clear how to classify the stop.
Changed the (null) values for the suspect_arrest_offense to “No Arrest”.
Finally, converted missing values in the stop_location_precinct column to NA since the precincts are numbered from 1 to 123.
The cleaned and updated dataset includes 34,784 observations ommiting the stops conducting on people that were not described as Black, White, Black Hispanic, or White Hispanic.
saf_sub %>%
count(suspect_race_description) %>%
ggplot(aes(x = n, y = fct_reorder(suspect_race_description, n))) +
geom_col(fill = "#7dc8c4") +
labs(x = "Number of Stops",
y = "Race",
title = "Number of Stops by Race") +
theme_minimal() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
axis.text = element_text(size = 11,
face = "bold"),
axis.title.x = element_text(size = 14,
face = "bold"),
axis.title.y = element_blank(),
panel.grid.major.x = element_line(color = "grey27"),
panel.grid.minor.x = element_line(color = "grey27"),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"))
saf_sub %>%
count(suspect_sex) %>%
drop_na() %>%
ggplot(aes(x = n, y = fct_reorder(suspect_sex, n))) +
geom_col(fill = "#7dc8c4") +
labs(x = "Number of Stops",
y = "Sex",
title = "Number of Stops by Sex") +
theme_minimal() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
axis.text = element_text(size = 11,
face = "bold"),
axis.title.x = element_text(size = 14,
face = "bold"),
axis.title.y = element_blank(),
panel.grid.major.x = element_line(color = "grey27"),
panel.grid.minor.x = element_line(color = "grey27"),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"))
d1 <- saf_sub %>%
count(day2) %>%
ggplot(aes(x = n, y = fct_reorder(day2, n))) +
geom_col(fill = "#7dc8c4") +
labs(x = "Number of Stops",
y = "Weekday") +
theme_minimal() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
axis.text = element_text(size = 10,
face = "bold"),
axis.title = element_blank(),
panel.grid.major.x = element_line(color = "grey27"),
panel.grid.minor.x = element_line(color = "grey27"),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"))
m1 <- saf_sub %>%
count(month2) %>%
ggplot(aes(x = n, y = fct_reorder(month2, n))) +
geom_col(fill = "#7dc8c4") +
labs(x = "Number of Stops",
y = "Month") +
theme_minimal() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
axis.text = element_text(size = 10,
face = "bold"),
axis.title = element_blank(),
panel.grid.major.x = element_line(color = "grey27"),
panel.grid.minor.x = element_line(color = "grey27"),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"))
y1 <- saf_sub %>%
count(year2) %>%
mutate(year = fct_reorder(factor(year2), n)) %>%
ggplot(aes(x = n, y = year)) +
geom_col(fill = "#7dc8c4") +
labs(x = "Number of Stops",
y = "Month") +
theme_minimal() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
axis.text = element_text(size = 12,
face = "bold"),
axis.title = element_blank(),
panel.grid.major.x = element_line(color = "grey27"),
panel.grid.minor.x = element_line(color = "grey27"),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"))
(d1 + m1) / y1 +
plot_annotation(title = "Number of Stops by Weekday, Month, and Year") &
theme(plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"),
strip.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"),
plot.title = element_text(face = "bold",
size = 14))
Over 5500 of the stops occurred on Saturdays, while less than 4000 stops occurred on Mondays.
The least number of stops were made in December, while the Spring months of March, April and May had the highest number of stops.
saf_sub %>%
drop_na(suspect_reported_age) %>%
filter(suspect_reported_age >= 10 &
suspect_reported_age <= 80) %>%
ggplot(aes(x = suspect_reported_age)) +
geom_histogram(fill = "#7dc8c4",color = "white",
binwidth = 5) +
scale_x_continuous(breaks = seq(10, 80, 10)) +
labs(x = "Age",
y = "Number of Stops",
title = "Number of Stops by Age") +
theme_minimal() +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
axis.text = element_text(size = 11,
face = "bold"),
axis.title.x = element_text(size = 14,
face = "bold"),
axis.title.y = element_blank(),
panel.grid.major.y = element_line(color = "grey27"),
panel.grid.minor.y = element_line(color = "grey27"),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"))
saf_sub %>%
count(stop_location_boro_name) %>%
drop_na() %>%
ggplot(aes(x = n, y = fct_reorder(stop_location_boro_name, n))) +
geom_col(fill = "#7dc8c4") +
labs(x = "Number of Stops",
y = "Borough",
title = "Number of Stops by Borough") +
theme_minimal() +
theme(panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
axis.text = element_text(size = 11,
face = "bold"),
axis.title.x = element_text(size = 14,
face = "bold"),
axis.title.y = element_blank(),
panel.grid.major.x = element_line(color = "grey27"),
panel.grid.minor.x = element_line(color = "grey27"),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"))
plotly1 <- saf_sub %>%
drop_na(suspect_reported_age, suspect_race_description) %>%
filter(suspect_reported_age >= 10 &
suspect_reported_age <= 80) %>%
count(suspect_race_description, suspect_reported_age) %>%
ggplot(aes(x = suspect_race_description, y = suspect_reported_age)) +
geom_tile(aes(fill = n)) +
scale_fill_viridis_c(option = "cividis") +
scale_y_continuous(breaks = seq(10, 80, 5)) +
labs(x = "Race") +
theme_minimal() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.text = element_text(size = 11,
face = "bold"),
axis.title.x = element_text(size = 14,
face = "bold"),
axis.title.y = element_blank(),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"))
ggplotly(plotly1)
saf_sub %>%
count(suspect_race_description, suspected_crime_description) %>%
pivot_wider(names_from = suspect_race_description,
values_from = n) %>%
gt() %>%
tab_header(
title = "Race and Alleged Crime Description",
subtitle = "Data from 2017-2019"
) %>%
data_color(
columns = vars(BLACK,`BLACK HISPANIC`,
WHITE,`WHITE HISPANIC`),
colors = scales::col_numeric(
palette = c(
"lightskyblue", "dodgerblue", "royalblue4") ,
domain = NULL
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'))
),
locations = cells_body(
columns = 1:5
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'),
weight = "bold")
),
locations = cells_column_labels(
columns = 1:5
)
) %>%
cols_label(
suspected_crime_description = "Crime"
)
| Race and Alleged Crime Description | ||||
|---|---|---|---|---|
| Data from 2017-2019 | ||||
| Crime | BLACK | BLACK HISPANIC | WHITE | WHITE HISPANIC |
| ASSAULT | 2592 | 417 | 342 | 1021 |
| AUTO STRIPPIG | 67 | 19 | 17 | 47 |
| BURGLARY | 1213 | 215 | 458 | 631 |
| CPSP | 96 | 19 | 26 | 54 |
| CPW | 6205 | 872 | 551 | 1948 |
| CRIMINAL MISCHIEF | 342 | 45 | 82 | 150 |
| CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE | 124 | 10 | 69 | 62 |
| CRIMINAL POSSESSION OF FORGED INSTRUMENT | 34 | 4 | 3 | 6 |
| CRIMINAL POSSESSION OF MARIHUANA | 365 | 61 | 24 | 166 |
| CRIMINAL SALE OF CONTROLLED SUBSTANCE | 115 | 25 | 47 | 80 |
| CRIMINAL SALE OF MARIHUANA | 57 | 9 | 6 | 23 |
| CRIMINAL TRESPASS | 1020 | 160 | 202 | 482 |
| FORCIBLE TOUCHING | 59 | 7 | 16 | 35 |
| GRAND LARCENY | 895 | 152 | 115 | 273 |
| GRAND LARCENY AUTO | 577 | 94 | 136 | 271 |
| MAKING GRAFFITI | 36 | 15 | 54 | 68 |
| MENACING | 401 | 67 | 56 | 163 |
| MISD | 1 | 2 | NA | 1 |
| MISDEMEANOR | 1 | 2 | NA | 1 |
| MURDER | 46 | 7 | 4 | 21 |
| OTHER | 871 | 130 | 152 | 321 |
| PETIT LARCENY | 1870 | 216 | 449 | 607 |
| PROSTITUTION | 13 | 1 | 3 | 8 |
| RAPE | 43 | 8 | 8 | 15 |
| RECKLESS ENDANGERMENT | 142 | 18 | 7 | 44 |
| ROBBERY | 3415 | 500 | 290 | 1104 |
| TERRORISM | 7 | 1 | 17 | 3 |
| THEFT OF SERVICES | 96 | 8 | 25 | 30 |
| UNAUTHORIZED USE OF A VEHICLE | 114 | 18 | 106 | 88 |
| FELONY | NA | NA | 1 | NA |
saf_sub %>%
filter(suspected_crime_description == "CPW") %>%
count(suspect_race_description, weapon_found_flag) %>%
pivot_wider(names_from = suspect_race_description,
values_from = n) %>%
gt() %>%
tab_header(
title = "Race and Weapons for CPW",
subtitle = "Data from 2017-2019"
) %>%
data_color(
columns = vars(BLACK,`BLACK HISPANIC`,
WHITE,`WHITE HISPANIC`),
colors = scales::col_numeric(
palette = c(
"lightskyblue", "dodgerblue", "royalblue4") ,
domain = NULL
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'))
),
locations = cells_body(
columns = 1:5
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'),
weight = "bold")
),
locations = cells_column_labels(
columns = 1:5
)
) %>%
cols_label(
weapon_found_flag = "Weapon Found"
)
| Race and Weapons for CPW | ||||
|---|---|---|---|---|
| Data from 2017-2019 | ||||
| Weapon Found | BLACK | BLACK HISPANIC | WHITE | WHITE HISPANIC |
| N | 5046 | 701 | 425 | 1506 |
| Y | 1159 | 171 | 126 | 442 |
saf_sub %>%
count(suspect_race_description, frisked_flag) %>%
pivot_wider(names_from = suspect_race_description,
values_from = n) %>%
gt() %>%
tab_header(
title = "Race and Frisked",
subtitle = "Data from 2017-2019"
) %>%
data_color(
columns = vars(BLACK,`BLACK HISPANIC`,
WHITE,`WHITE HISPANIC`),
colors = scales::col_numeric(
palette = c(
"lightskyblue", "dodgerblue", "royalblue4") ,
domain = NULL
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'))
),
locations = cells_body(
columns = 1:5
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'),
weight = "bold")
),
locations = cells_column_labels(
columns = 1:5
)
) %>%
cols_label(
frisked_flag = "Frisked?"
)
| Race and Frisked | ||||
|---|---|---|---|---|
| Data from 2017-2019 | ||||
| Frisked? | BLACK | BLACK HISPANIC | WHITE | WHITE HISPANIC |
| N | 8231 | 1167 | 1808 | 3137 |
| Y | 12586 | 1935 | 1458 | 4586 |
saf_sub %>%
count(suspect_race_description, suspect_arrested_flag) %>%
pivot_wider(names_from = suspect_race_description,
values_from = n) %>%
gt() %>%
tab_header(
title = "Race and Arrest",
subtitle = "Data from 2017-2019"
) %>%
data_color(
columns = vars(BLACK,`BLACK HISPANIC`,
WHITE,`WHITE HISPANIC`),
colors = scales::col_numeric(
palette = c(
"lightskyblue", "dodgerblue", "royalblue4") ,
domain = NULL
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'))
),
locations = cells_body(
columns = 1:5
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'),
weight = "bold")
),
locations = cells_column_labels(
columns = 1:5
)
) %>%
cols_label(
suspect_arrested_flag = "Arrested?"
)
| Race and Arrest | ||||
|---|---|---|---|---|
| Data from 2017-2019 | ||||
| Arrested? | BLACK | BLACK HISPANIC | WHITE | WHITE HISPANIC |
| N | 14748 | 2165 | 2209 | 5179 |
| Y | 6069 | 937 | 1057 | 2544 |
In these electronic forms there are 2 columns where we can perform sentiment analysis. Sentiment analysis could be used to understand the emotional component of a text. Here I used VADER, a model used to measure the negative, positive, neutral and overall sentiment intensity of a text (Hutto, C.J. et. al., 2014). The first column I chose to run sentiment analysis on was the demeanor_of_person_stopped. Below is a list of the top 10 responses for the demeanor of a person stopped.
saf_sub %>%
count(demeanor_of_person_stopped, sort = TRUE) %>%
drop_na(demeanor_of_person_stopped) %>%
slice(1:10) %>%
gt() %>%
tab_header(
title = "Top 10 Descriptions of Demeanor of Person Stopped",
subtitle = "Data from 2017-2019"
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'))
),
locations = cells_body(
columns = 1:2
)
) %>%
tab_style(
style = list(
cell_text(size = "large",
font = google_font('Lato'),
weight = "bold")
),
locations = cells_column_labels(
columns = 1:2
)
) %>%
cols_label(
demeanor_of_person_stopped = "Demeanor"
)
| Top 10 Descriptions of Demeanor of Person Stopped | |
|---|---|
| Data from 2017-2019 | |
| Demeanor | n |
| CALM | 7280 |
| NERVOUS | 3384 |
| CPW | 1817 |
| UPSET | 1525 |
| ROBBERY | 1480 |
| NORMAL | 1453 |
| COOPERATIVE | 793 |
| PETIT LARCENY | 727 |
| COMPLIANT | 648 |
| ANGRY | 542 |
get_vader(("CALM"))[2:5]
compound pos neu neg
"0.318" "1" "0" "0"
get_vader(("NERVOUS"))[2:5]
compound pos neu neg
"-0.273" "0" "0" "1"
get_vader(("NORMAL"))[2:5]
compound pos neu neg
"0" "0" "1" "0"
demeanor_of_person_stopped. The scale for the compound (overall) score is a value between -1 to 1. The closer to -1 the more negative, while the closer to 1, the more positive the sentiment is. “CALM” had an overall score of 0.3183, “NERVOUS” had an overall score of -0.2732, and “NORMAL” had an overall score of 0.0. These scores make sense since calm is usually more positive, nervous is usually more negative, while normal is fairly neutral.# select columns for further analysis
saf_dem_cols <- saf_sub %>%
select(suspect_race_description,
suspect_sex,
frisked_flag,
suspect_arrested_flag,
demeanor_of_person_stopped)
# obtain vader scores
saf_dem_scores <-
vader_df(saf_sub$demeanor_of_person_stopped)[2:5]
# join data frames
saf_dem_join <- cbind(saf_dem_cols, saf_dem_scores)
# boxplot race, arrested
boxplot1 <- saf_dem_join %>%
drop_na(suspect_race_description,
compound) %>%
ggplot(aes(x = suspect_arrested_flag,
y = compound)) +
geom_boxplot(aes(fill = suspect_arrested_flag),
outlier.fill = "red",
position = "dodge2") +
facet_grid(.~suspect_race_description) +
scale_fill_manual(values = c("springgreen3", "tomato1")) +
labs(x = "Arrested",
fill = "Arrested") +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
axis.text = element_text(size = 11,
face = "bold"),
axis.title.x = element_text(size = 10),
panel.grid.major.y = element_line(color = "grey40"),
panel.grid.minor.y = element_line(color = "grey40"),
plot.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"),
strip.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"),
strip.text = element_text(size = 13,
face = "bold"),
legend.background = element_rect(colour = "#e7eaea",
fill = "#e7eaea"),
legend.title = element_blank(),
legend.text = element_text(size = 11,
face = "bold"))
ggplotly(boxplot1)
In the past 3 years NYPD has stopped young Black males most often. Over 50% of the total stops were of Black people and over 90% of the stops were of males. The sentiment analysis using the VADER model provided some insightful results. Why is there a difference between the boxplots of arrested and not arrested people for Black and Hispanic people but no differences for White people? Further analysis of the text from the `demeanor_of_person_stopped variable would be of interest. The differences in sex for all racial groups is also evident. Females are said to have more negative demeanor in comparison to males. This could be due to the small sample size of females that are stopped.
There are numerous avenues that could be researched for further analysis from this dataset. Some possible questions: When are people of different races stopped during the day? For each race, what proportion of stops lead to physical force? In the precincts that do not stop Black people the most, what is the racial makeup of that precinct neighborhood?
There are a few questions to address from this dataset. First, these are only the recorded stops that are reported by the NYPD. There could be numerous stops that are not recorded by an officer for various reasons that could alter the data. Also, this data is inputted by the officer and thus they are essentially the data collectors. It would be important to somehow validate some of these stops, possibly by looking through police body cam footage.
This analysis further shows the explicit and implicit bias police have towards Black people. It is imperative during this time we all evaluate our bias toward people of different races. Police should be held to the highest standards since they are the ones tasked with keeping our communities safe and are in positions of authority. Unfortunately research has shown that the sometimes invasive and frequent stops have had detrimental health effects on minorities in racially diverse communities (Sewell A.A. et. al., 2016). Research to uncover biases towards minorities is extremely important and should be further investigated.